ABSTRACT

We present two methods for determining spectroscopic redshifts of galaxies in the DEEP2 survey
which display only one identifiable feature, an emission line, in the observed spectrum ("single-line
galaxies"). First, we assume each single line is one of the four brightest lines accessible to DEEP2: Hα,
[O III] λ5007, Hβ, or [O II] λ3727. Then, we supplement spectral information with BRI photometry.
The first method, parameter space proximity (PSP), calculates the distance of a single-line galaxy to
galaxies of known redshift in (B − R), (R − I), R, λ_observed parameter space. The second method is an
artificial neural network (ANN). Prior information, such as allowable line widths and ratios, rules out
one or more of the four lines for some galaxies in both methods. Based on analyses of evaluation sets,
both methods are nearly perfect at identifying blended [O II] doublets. Of the lines identified as Hα in
the PSP and ANN methods, 91.4% and 94.2% respectively are accurate. Although the methods are
not this accurate at discriminating between [O III] and Hβ, they can identify a single line as one of the
two, and the ANN method in particular unambiguously identifies many [O III] lines. From a sample
of 640 single-line spectra, the methods determine the identities of 401 (62.7%) and 472 (73.8%) single
lines, respectively, at accuracies similar to those found in the evaluation sets.
Subject headings: galaxies: distances and redshifts — line: identification 



1. INTRODUCTION 

Photometric redshifts (photo-zs) save astronomers
from expensive spectroscopy by determining redshifts
from efficient broadband photometry. However, photo-z
precision cannot compare to spectroscopic redshift
precision. Way & Srivastava (2006) compare five
state-of-the-art methods to determine redshifts from Sloan
Digital Sky Survey (SDSS) photometry. Neural networks,
which are non-linear regression tools, perform the best,
but the rms error in the photo-zs is δz/(1 + z) ~ 0.02 at best.
This precision is sufficient to study large scale structure
and some forms of redshift evolution, but not local
environments (Cooper et al. 2005), kinematic pairs, or the
low-redshift luminosity function, where precise
luminosities require precise redshifts. At a spectral resolution
R = 5000, the spectroscopic redshifts in the Deep
Extragalactic Evolutionary Probe (DEEP2) Galaxy Redshift
Survey (Davis et al. 2003) are many times more precise.
Repeat observations show redshift errors of δz ~ 10^-4,
or velocity errors of δv ~ 30 km/s.

The expense of time-intensive spectroscopy demands a 
high rate of successfully determined redshifts. Nonethe- 
less, some spectra fail at providing redshifts. One of the 
most common redshift failures is the presence of only 
one emission line. Ordinarily, recognizable patterns of 
emission or absorption lines uniquely determine the lines' 
identities and hence rest wavelengths. Spectroscopically
precise observed wavelengths then give highly precise 
redshifts. An isolated emission line forms no recogniz- 
able pattern to reveal its identity and rest wavelength. 

Recovering failed spectroscopic redshifts with
broadband photometry has hardly been explored in the
literature. Cohen et al. (1999) assume all single emission
lines in the Caltech Faint Galaxy Redshift Survey
(CFGRS) are [O II] λ3727 by arguing that any other line
except Lyα would be accompanied by other emission lines.
Lilly et al. (1995) have identified single emission lines in
Canada-France Redshift Survey (CFRS) spectra based
on the slope of the continuum surrounding the line.
(Unlike DEEP2, neither CFGRS nor CFRS had the
spectroscopic resolution to resolve the [O II] doublet,
requiring them to make assumptions about the identities
of single lines.)

In this paper, we also rely on the shape of the
continuum, but we will show that broadband BRI colors
can determine the identity of single emission lines
accurately, even for galaxies with no visible spectral
continuum. The problem of an isolated emission line afflicts
~1.5% of DEEP2 targets, but the problem is more
prevalent for serendipitously detected galaxies ("serendips")
that share a slit with target galaxies through a fortuitous
position on the sky. Recovery of redshifts from serendip
spectra is important because serendips form an unbiased
sample of galaxy spectra.
We explore the problem through two methods. The
parameter space proximity (PSP) method identifies the
redshift of a single-line galaxy by comparing its
photometric magnitude and colors with those of galaxies with
known spectroscopic redshifts. The artificial neural
network (ANN) method employs the ANN machine learning
algorithm, which learns the functional relationship
between any number of dependent and independent
variables. ANNs can determine photometric redshifts from
relevant observable quantities such as colors, apparent
magnitudes, and angular sizes. Here, we use apparent
magnitudes (from which the ANN determines colors) and
observed wavelengths of the single emission lines.

This paper is organized as follows: §2 describes the
DEEP2 survey and how single emission lines arise in its
spectra; §3 describes the two methods for determining
the identities of these single lines; §4 examines the
accuracy of each method; §5 presents the results of applying
the methods to single emission lines; and §6 summarizes
our work and discusses our future plans.

2. DATA 
2.1. DEEP2 Survey 

The DEEP2 Survey (outlined by Davis et al. 2003)
combines BRI photometry in four fields (described by
Coil et al. 2004) from the CFHT 12k × 8k mosaic camera
and spectroscopy (described by Newman et al. 2007, in
preparation) from the DEIMOS spectrograph (Faber et al.
2003) on the Keck II telescope. Davis et al. (2005)
detail the DEEP2 target selection, based on the three-filter
(B, R, I) photometry. A color-color cut is applied to
spectroscopy candidates in three of the four 120 arcmin^2
DEEP2 fields to reduce as much as possible the number of
spectra of z < 0.7 galaxies. The remaining field (Field 1)
has no such color-color selection. An apparent magnitude
cut of R_AB < 24.1 is applied to all fields. Furthermore,
each spectroscopy target must be detected in B, R, and
I. The spec2d and spec1d software packages, written by
the DEEP2 team, accomplish the DEIMOS spectroscopy
reduction, including sky subtraction and an instrumental
throughput correction to preserve actual line strengths.

Field 1 is unique in other ways as well. It overlaps 
the Extended Groth Strip (EGS), which many different 
teams and instruments observe heavily. The wealth of 
photometry for the DEEP2 targets in Field 1 permits very 
good photo-z estimates (Huang et al. 2007, in preparation).
We plan to make use of the additional EGS data
in future work. However, in this paper, we use only the 
BRI photometry in Field 1, and we treat it no differently 
from Fields 2-4. 

Astronomers from the DEEP2 team visually inspect
every spectrum using the zspec script from the spec1d
software package, designed by DEEP2 members. They
assign spectra with at least two strong identifiable lines
(such as an unblended [O II] doublet) a redshift quality
code Q = 4. Spectra with one strong line and at least
one weaker line receive Q = 3. The entire survey contains
49,059 spectra. Of them, 27,460 and 5,829 are Q = 4
and 3 respectively. The Q = 2 category encompasses
all failed redshifts that may be recovered with additional
effort. The inspectors note the reason for all failures in
this category, including the absence of all but one line.



2.2. Single Emission Line Galaxies 

The spectral resolution of a DEEP2 spectrum is 1.4 Å
FWHM, and the typical spectral range is about 6600 Å to
9200 Å. Therefore, few redshifts permit both Hα λ6563
and [O III] λ5007 or Hβ λ4861 to fall on the same
spectrum. Similarly, few redshifts permit both [O III] or Hβ
and [O II] λ3727. Consequently, most DEEP2 galaxies
which are too faint to have a weaker emission line or a
continuum with noticeable absorption lines display only
one visible feature: Hα, [O III], Hβ, or [O II] in emission.
This feature is often the [O II] doublet, which is easily
identifiable by its invariable 220 km/s peak separation
as long as the galaxy's internal gas velocity dispersion
does not broaden and blend both peaks. However, the
feature may be a truly isolated single line, in which case
the spectrum cannot yield a unique redshift. Lines may
appear to be orphans if the signal-to-noise is low enough
to permit only the brightest line to be seen. Other weak
lines may be lost in the noise of night sky lines, even with
good sky subtraction. Finally, a gap of ~5 Å separates
the red and blue CCDs in DEIMOS, and any line in a pair
of otherwise visible lines that falls completely in the gap
will orphan its partner.
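
The claim that few redshifts place two bright lines on the same spectrum can be checked with a short calculation. The sketch below assumes a fixed 6600-9200 Å window (the "typical" range quoted above; individual slits differ) and uses hypothetical helper names.

```python
# Rest wavelengths (Angstroms) of the four bright lines named in the text.
REST = {"Ha": 6563.0, "OIII": 5007.0, "Hb": 4861.0, "OII": 3727.0}

# Typical DEEP2 spectral range quoted in the text; an assumption for
# illustration, since the true range varies from slit to slit.
LAM_MIN, LAM_MAX = 6600.0, 9200.0


def visible_window(rest):
    """Redshift interval over which a line with this rest wavelength is in range."""
    return LAM_MIN / rest - 1.0, LAM_MAX / rest - 1.0


def overlap(a, b):
    """Redshift interval where lines a and b are both in range, or None."""
    lo = max(visible_window(REST[a])[0], visible_window(REST[b])[0])
    hi = min(visible_window(REST[a])[1], visible_window(REST[b])[1])
    return (lo, hi) if lo < hi else None


for pair in [("Ha", "OIII"), ("Ha", "Hb"), ("OIII", "OII"), ("Hb", "OII"), ("Ha", "OII")]:
    print(pair, overlap(*pair))
```

Under these assumptions the shared windows are narrow (for example, Hα and [O III] λ5007 coexist only for roughly 0.32 < z < 0.40), and Hα never shares a spectrum with [O II].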

We select all 984 redshift failures that result from a
single emission line. After an additional visual
inspection to remove spectra with marginally detected lines,
spectra mistakenly marked as having a single line, and
spectra without any visible emission lines, our sample
contains 640 single-line emission galaxies. We identify
the pixel within the line that contains the most counts
in the spectrum smoothed through a Gaussian window
function with σ = 4 pixels = 1.3 Å and inverse
variance weighting. We designate that pixel's observed
wavelength as the wavelength of the single line. Finally, we
note whether the line is broad enough to be a blended
[O II] doublet. For the purposes of identifying lines, it
is not necessary to be more precise in determining the
observed wavelength of the [O II] doublet than selecting
the pixel with the most counts.
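
As an illustration of the peak-finding step, the following is a minimal sketch of inverse-variance-weighted Gaussian smoothing. It is not the DEEP2 pipeline code; the function name, the kernel truncation at 4σ, and the synthetic spectrum are assumptions made for the example.

```python
import numpy as np


def peak_wavelength(wave, flux, ivar, sigma_pix=4.0):
    """Return the wavelength of the pixel with the most smoothed counts.

    The smoothed value at each pixel is sum(k * ivar * flux) / sum(k * ivar),
    where k is a Gaussian kernel of width sigma_pix (in pixels).
    """
    half = int(np.ceil(4 * sigma_pix))          # truncate kernel at 4 sigma (assumption)
    offsets = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (offsets / sigma_pix) ** 2)
    num = np.convolve(flux * ivar, kernel, mode="same")
    den = np.convolve(ivar, kernel, mode="same")
    smooth = np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0)
    return wave[np.argmax(smooth)], smooth


# Synthetic example: one emission line at pixel 120 plus one bad pixel
# (huge counts, zero inverse variance) that the weighting should ignore.
pix = np.arange(200)
wave = 7000.0 + 0.325 * pix                     # 4 pixels = 1.3 Angstroms
flux = 50.0 * np.exp(-0.5 * ((pix - 120) / 3.0) ** 2)
ivar = np.ones_like(flux)
flux[60], ivar[60] = 1.0e4, 0.0
lam_peak, smoothed = peak_wavelength(wave, flux, ivar)
print(lam_peak)
```

The bad pixel carries zero weight, so it does not create a false peak in the smoothed spectrum.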

It is worth mentioning that the line identification meth- 
ods presented here will fail at identifying the redshifts 
of composite galaxies, such as those blended together 
through lensing or line-of-sight coincidence. We assume 
that the broadband colors and magnitudes associated 
with all of the lines identified come from a single galaxy, 
and that composite colors and magnitudes will be dis- 
tinct enough in parameter space that their associated 
single lines will not receive a conclusive identification. 

3. METHODS 

The bright lines most often visible in DEEP2 spectra
are Hα λ6563, [O III] λ5007, Hβ λ4861, and the [O II]
λλ3726, 3729 doublet. If any other line is visible, then
one of these four is almost always visible as well. The
B-band detection requirement eliminates the possibility of
Lyα. On rare occasions, bright sky lines or the DEIMOS
CCD gap may hide one of these four lines, leaving only
one other dimmer line, such as [O III] λ4959 or Hγ λ4341,
visible. For this paper, we assume that any single
emission line is one of the four bright lines.

If a line is observed at λ_o, then we assume the galaxy's
redshift is one of z_Hα = λ_o/6563 Å − 1, z_OIII =
λ_o/5007 Å − 1, z_Hβ = λ_o/4861 Å − 1, or z_OII =
λ_o/3727 Å − 1. We employ two independent methods,
described below, to assign probabilities to the four
possible line identities or redshifts: P_Hα, P_OIII, P_Hβ, and P_OII.
Additionally, we set to zero the probabilities of lines that
satisfy the following conditions:

1. If the line is observed bluer than 0.98 λ_Hα =
6431.5 Å, then P_Hα = 0. In other words, we do
not permit blueshifts z < −0.02.

2. Of the Q = 3 and Q = 4 DEEP2 galaxies at
redshifts where both Hα and Hβ are accessible, 80%
exhibit a Balmer decrement of 2.6 < Hα/Hβ < 7.7
where the unit is spectral counts at the line peak.
Therefore, if the line is assumed to be Hα, and Hβ
also falls within the DEIMOS slit's spectral range,
then P_Hα = 0 if the smoothed spectral counts at
the location of the single line are less than 2.6 or
greater than 7.7 times the counts at the location of
Hβ within the errors of the photon counting
statistics.

3. Similarly, 80% of Q = 3 and Q = 4 DEEP2 spectra
with both [O III] lines exhibit an [O III] doublet ratio
of 1.6 < [O III] λ5007/[O III] λ4959 < 5.0.
Therefore, if the line is assumed to be [O III] λ5007 and
[O III] λ4959 should also be visible, then P_OIII = 0
if the counts at the location of [O III] λ5007 are
less than 1.6 or greater than 5.0 times the counts
at the location of [O III] λ4959 within the errors
of the photon counting statistics. The true
physical ratio is always [O III] λ5007/[O III] λ4959 = 3
(Osterbrock & Ferland 2006).

4. In 80% of Q = 3 and Q = 4 DEEP2 spectra with
both [O III] λ5007 and Hβ, 0.49 < [O III]/Hβ < 4.2.
Therefore, P_OIII = 0 if [O III]/Hβ > 4.2 within the
errors of the photon counting statistics.

5. Following condition (2), P_Hβ = 0 if Hβ/Hα > 0.34
within the errors of the photon counting statistics.

6. Following condition (4), P_Hβ = 0 if Hβ/[O III] > 1.6
within the errors of the photon counting statistics.

7. If the line drops to no visible counts within a
window smaller than 220 km/s, the velocity separation
of the [O II] doublet, then P_OII = 0 because the line
cannot be dispersion-blended [O II].

Conditions (2) through (6) rely on an 80% confidence 
interval, which may seem strict, but the large majority 
of the typically low signal-to-noise single lines pass these 
tests by virtue of their large photon counting errors. 

After all priors have been applied, we normalize the 
remaining probabilities P such that their sum is unity. 
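
The bookkeeping described in this section (four candidate redshifts, priors that zero out some identities, and renormalization) can be sketched as follows. Only condition (1) is implemented explicitly; conditions (2) through (7) need spectral counts and are represented by a hypothetical `ruled_out` argument. Function names are illustrative, not taken from the paper.

```python
# Rest wavelengths (Angstroms) used in the text for the four bright lines.
REST = {"Ha": 6563.0, "OIII": 5007.0, "Hb": 4861.0, "OII": 3727.0}


def candidate_redshifts(lam_obs):
    """Redshift implied by each possible identity of a line seen at lam_obs."""
    return {name: lam_obs / rest - 1.0 for name, rest in REST.items()}


def apply_priors(probs, lam_obs, ruled_out=()):
    """Zero out excluded identities, then renormalize so the sum is unity."""
    p = dict(probs)
    # Condition (1): do not permit Ha at blueshifts z < -0.02.
    if lam_obs < 0.98 * REST["Ha"]:
        p["Ha"] = 0.0
    # Conditions (2)-(7) are assumed to have been evaluated elsewhere.
    for name in ruled_out:
        p[name] = 0.0
    total = sum(p.values())
    if total == 0.0:
        return p  # every identity excluded; nothing to normalize
    return {name: value / total for name, value in p.items()}


flat = {name: 0.25 for name in REST}
print(candidate_redshifts(8425.0))
print(apply_priors(flat, lam_obs=6000.0))
```

For a line at 8425 Å the Hα hypothesis gives z ≈ 0.28, while a line at 6000 Å cannot be Hα under condition (1), so the remaining three probabilities share the total.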

In the language of photo-zs, both of the following tech- 
niques are "training set" methods, or empirical calibra- 
tions. Training methods are immune to improper mod- 
eling and even incorrect photometric zero-point offsets. 
One weakness of training methods is that they require 
large training sets for their accuracy and precision to 
compare to that of modeling methods. 

Another weakness is that they assume that the sample 
population properties are similar to the training set
properties. For example, if single-line galaxies preferentially
have lower metallicity and bluer colors than training set
galaxies, then the methods described here will search a 
skewed region of parameter space. Furthermore, a larger 
fraction of single-line galaxies show little or no contin- 
uum than training set galaxies, which may skew colors. 
For this paper, we assume that the photometric proper- 
ties of single-line galaxies are a subset of the photomet- 
ric properties of training set galaxies, meaning that the 
parameter space around each single-line galaxy is well- 
populated by training set galaxies at similar redshifts. 

3.1. Known-redshift sets 

Both methods which assign probabilities to each of
the four major lines require sets of galaxies with
well-measured redshifts. The known-redshift set consists of
DEEP2 targets with quality Q = 3 or 4 spectroscopic
redshifts and with at least one emission line detected at least
5σ above the noise. There are 20,676 such galaxies. We
identify the emission line containing the pixel with the
maximum counts in the spectrum smoothed in the same
manner as the single-line galaxy spectra in §2.2. For the
remainder of the training process, we treat the spectrum
as containing only that single line, but the redshift (and
hence the identity of the single line) is known.

Inaccurate redshifts in the known-redshift set of course 
lead to inaccurate training and spurious line identifica- 
tion. Repeat observations have shown that fewer than 
4.5% of the 2,061 Q = 3 redshifts and fewer than 0.5% 
of the 18,615 Q = 4 redshifts are incorrect. Therefore, 
the known-redshift set is less than 2% contaminated by 
incorrect redshifts. 
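
The contamination bound follows directly from the quoted counts and error rates; the short check below uses only those numbers and treats the quoted percentages as upper limits.

```python
# Counts and upper limits on the error rates quoted in the text.
n_q3, n_q4 = 2061, 18615
max_bad = 0.045 * n_q3 + 0.005 * n_q4          # at most ~186 wrong redshifts
max_fraction = max_bad / (n_q3 + n_q4)
print(f"{100 * max_fraction:.2f}% upper limit on contamination")
```

The result is just under 1%, consistent with the stated bound of less than 2%.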

Both of the following methods assume that the train- 
ing set galaxies populate the same observable parameter 
space as true single-line galaxies. The different target 
selection functions between Field 1 (6,532 galaxies) and 
Fields 2-4 (14,144 galaxies) make this point especially 
important. The underrepresentation of z < 0.7 galax- 
ies, which are present almost exclusively in Field 1, de- 
creases the certainty at which z < 0.7 galaxies may be 
identified. However, as long as z < 0.7 single-line galax- 
ies populate a region of parameter space distinct from 
z > 0.7 galaxies, and as long as z < 0.7 training set 
galaxies also populate that region, the following
algorithms should not mistake low-redshift galaxies for
high-redshift ones. Therefore, we combine galaxies from all
fields into one training set. We have also analyzed both
methods with field segregation. We tested Field 1
galaxies using a training set with only Field 1 galaxies, and we
tested Fields 2-4 galaxies using a training set with only
Fields 2-4 galaxies. The accuracy was statistically
indistinguishable from that of a unified training set, but the
number of conclusively identified galaxies decreased slightly.

3.2. Parameter space proximity method 

We randomly divide the known-redshift set into
training and evaluation subsets. The latter plays no role in
the training, and we invoke it only in §4. A larger
training set yields higher precision whereas a larger evaluation
set provides more confident tests of the method. We find
that diverting 20% of the known-redshift set into the
evaluation set, leaving 80% for the training set, gives a
significant, untouched sample by which to judge
performance without significantly affecting precision.
The survey's three available broadband filter
measurements, B, R, and I, permit photo-z measurements.
Because emission line galaxies in different redshift ranges
form different loci in (B − R), (R − I), R, λ_observed space,
a single galaxy's position in that space gives a guess at
its redshift. The two colors alone provide enough
information to determine some line identities, but adding
R apparent magnitude increases the number of
identifications by ~15%. Observed wavelength also provides
discriminatory power because both the single line's
observed wavelength and the galaxy's broadband colors are
functions of redshift.

We assign an identification confidence parameter P^j_X
for each single-line galaxy, represented by index j. X
represents one of the bright lines. Roughly, the
confidence parameter is a measure of distances from the point
of the sample galaxy to each point in the training set in
parameter space. More or closer points give a higher P^j_X,
and fewer or farther points give a lower P^j_X. See Fig. 1
for a simplified representation of this method.

More precisely, every point in the training set is
assigned a four-dimensional Gaussian. Each of the four
axes corresponds to one observable: B − R, R − I, R,
and λ_observed. The width of the Gaussian for each of
the three color and magnitude observables is given by
the sum in quadrature of the photometric error for the
training set galaxy i and the photometric error of the
single-line galaxy j. The wavelength Gaussian is unique
because it distinguishes between the four line identities.
Each galaxy in the training set has four wavelength
Gaussians, each corresponding to one of the four possible line
identities, represented by X. The wavelength at the peak
of Gaussian X is (1 + z) λ_X, where z is the known redshift
of the training set galaxy and λ_X is the rest wavelength
of one of the four lines. The width of this Gaussian is
the sum in quadrature of δλ = 1000 δz λ_X, where δz is
the error in the redshift from the spec1d spectral
template cross-correlation, and 106 Å, the median value of
δλ for the evaluation set. The factor of 1000 is necessary
to widen the Gaussians so that they actually overlap.
It was chosen to optimize accuracy, but the results are
very insensitive to the precise value. P^j_X is the sum of
the values of all of these Gaussians corresponding to line
X at the point where the single-line galaxy j lies in the
four-dimensional parameter space. In symbols,
\sigma_R^2 = (\delta R_i)^2 + (\delta R_j)^2    (4)

\sigma_\lambda^2 = (1000\,\delta z_i\,\lambda_X)^2 + (106~\text{Å})^2    (5)

Although colors may be determined more precisely than
Eqs. 2 and 3 suggest, we use the error simply to broaden
the Gaussians. Additionally, the Gaussians of galaxies
with larger photometric errors will contribute less to P^j_X
because they will have larger widths but the same
volume.

The confidence parameters are subjected to the prior
conditions described above and normalized such that
Σ_X P^j_X = 1. This normalization makes the Gaussian
prefactor of (2π)^-2 unnecessary. Thus, P^j_X represents
the probability that the identity of the single emission
line in spectrum j is X.
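
The PSP calculation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the dictionary layout, the function name `psp_probabilities`, and the two-galaxy toy training set are hypothetical, and color errors are passed in directly rather than derived from individual band errors. The prior conditions of §3 are omitted.

```python
import numpy as np

REST = {"Ha": 6563.0, "OIII": 5007.0, "Hb": 4861.0, "OII": 3727.0}


def gauss(x, mu, var):
    """Fixed-volume 1-D Gaussian; the constant (2 pi)^(-1/2) is dropped."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(var)


def psp_probabilities(train, target, boost=1000.0, floor=106.0):
    """Sum training-set Gaussians at the target's position for each line X."""
    # Photometric factor, shared by all four line hypotheses.
    phot = np.ones_like(train["z"], dtype=float)
    for key in ("BR", "RI", "R"):
        var = train["d" + key] ** 2 + target["d" + key] ** 2
        phot *= gauss(target[key], train[key], var)
    raw = {}
    for name, rest in REST.items():
        # Wavelength width: (1000 dz lambda_X)^2 + (106 A)^2, as in Eq. 5.
        var_lam = (boost * train["dz"] * rest) ** 2 + floor ** 2
        g_lam = gauss(target["lam"], (1.0 + train["z"]) * rest, var_lam)
        raw[name] = float(np.sum(phot * g_lam))
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()} if total > 0 else raw


# Toy training set of two galaxies (made-up values for illustration only).
err = np.array([0.05, 0.05])
train = {"BR": np.array([1.0, 0.3]), "RI": np.array([0.5, 0.8]),
         "R": np.array([22.0, 23.5]), "dBR": err, "dRI": err, "dR": err,
         "z": np.array([0.2837, 1.26]), "dz": np.array([1e-4, 1e-4])}
target = {"BR": 1.0, "RI": 0.5, "R": 22.0,
          "dBR": 0.05, "dRI": 0.05, "dR": 0.05, "lam": 8425.0}
p = psp_probabilities(train, target)
print(p)
```

In this toy case the target shares its colors with the z = 0.2837 training galaxy, whose Hα Gaussian peaks at the observed wavelength, so nearly all of the probability lands on Hα.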

The criterion for a conclusive line identification is that
P^j_X for a certain line X exceeds p_PSP, a tunable
parameter. A larger p_PSP will yield fewer conclusive line
identifications, but a larger fraction of them will be correct.²

Because [O III] λ5007 and Hβ λ4861 always fall so close
to each other in wavelength, it is very difficult to
discriminate between them. However, it is possible to rule out
Hα and [O II] if P^j_OIII + P^j_Hβ > p_PSP. Additionally,
conditions (3) through (6) described above may rule out either
[O III] λ5007 or Hβ, leaving only one possibility.
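
The decision rule can be written compactly. The sketch below uses a hypothetical function name and a default threshold of 0.9, the value of p_PSP adopted later in §4; any other value could be passed in.

```python
def classify(p, threshold=0.9):
    """Return a line identity, the combined category, or 'inconclusive'."""
    for name in ("Ha", "OIII", "Hb", "OII"):
        if p.get(name, 0.0) > threshold:
            return name
    # Neither [O III] nor Hb wins alone, but together they may exclude
    # Ha and [O II].
    if p.get("OIII", 0.0) + p.get("Hb", 0.0) > threshold:
        return "OIII or Hb"
    return "inconclusive"


print(classify({"Ha": 0.95, "OIII": 0.03, "Hb": 0.01, "OII": 0.01}))
print(classify({"Ha": 0.02, "OIII": 0.50, "Hb": 0.45, "OII": 0.03}))
print(classify({"Ha": 0.40, "OIII": 0.30, "Hb": 0.20, "OII": 0.10}))
```

Raising the threshold moves borderline cases into the inconclusive category, which is the trade-off between the number of identifications and their accuracy.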

3.3. Artificial neural network method 

An artificial neural network (ANN) can learn the
functional relationship between certain elements in a data set
and derived properties of the same set. The first major
implementation of ANNs in astronomy was galaxy
morphological classification (Storrie-Lombardi et al. 1992),
but presently, the most common ANN
implementation is photometric redshifts (e.g., Firth et al. 2003;
Vanzella et al. 2004). ANNs are more precise than other
machine learning methods (e.g., trained decision tree
classifiers, Suchkov et al. 2005) and model-independent,
in contrast to template fitting methods (e.g., Coe et al.
2006; Brodwin et al. 2006). Given a set of photometric
data, such as broadband filter measurements, an ANN
can estimate a redshift and the error on that redshift.
This process requires a very large training set for
accurate and precise results. Typically, a large
known-redshift set with broadband data and spectroscopic
redshifts is divided randomly into three independent sets:
training, validation, and evaluation. (In this paper, we
divide the known-redshift set into 60% training, 20%
validation, and 20% evaluation to balance the precision a
large training set affords with the ability to test the ANN
on an evaluation set.) The ANN learns the dependence
of redshift on photometric observables from the training
set and interactively verifies its accuracy with the
validation set. After training and validation are complete, the
ANN configuration is fixed and unchangeable. At this
point, the evaluation set can test how well the ANN has
been trained (see §4). Finally, the ANN may be applied
to data without spectroscopic redshifts to obtain
photometric redshifts with an accuracy comparable to that
achieved with the evaluation set.
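
A random three-way split of this kind might look like the sketch below. The function name, the rounding convention, and the fixed seed are assumptions for illustration; the text does not specify how the split was implemented.

```python
import numpy as np


def split_indices(n, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle n indices and cut them into training/validation/evaluation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_valid = int(round(fractions[1] * n))
    train = order[:n_train]
    valid = order[n_train:n_train + n_valid]
    evaluation = order[n_train + n_valid:]      # whatever remains
    return train, valid, evaluation


# 20,676 is the size of the known-redshift set quoted in Section 3.1.
train_idx, valid_idx, eval_idx = split_indices(20676)
print(len(train_idx), len(valid_idx), len(eval_idx))
```

Because the three index arrays are disjoint slices of one permutation, no galaxy appears in more than one subset.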

In determining single emission line identity, we
employ an ANN with a four-component output, using the
publicly available code ANNz (Collister & Lahav 2004).
Each output corresponds to the probability of a line
identification. During training, the brightest line in the
spectrum of a galaxy of known redshift is assigned a
probability of 1 while the other probabilities are 0. In addition
to providing B, R, and I measurements as input to the
ANN, we also provide the wavelength of the single line.
Although there is danger in using irrelevant data as
input to an ANN, we justify the use of line wavelength by
remarking that the redshift and hence BRI magnitudes
and colors of a galaxy would be different if the single line
were Hα at observed wavelength λ_1 rather than a
different λ_2. In full detail, the ANN architecture is 4:4:4:4,
meaning 4 inputs, 4 outputs, and 2 hidden layers with 4
nodes each, shown diagrammatically in Fig. 2. We find
that other similar architectures do not affect line
identification significantly.

In determining the functional relationship between
inputs and outputs, an ANN finds the optimal
configuration of weights to connect each node. Before training
begins, the ANN is initialized with random weights. For
this reason, ANNs with the same architecture and same
training sets but with different initializations will
usually find different local minima. The solution is to use
the average of the results of a committee of separately
initialized ANNs. We use committees of 20 ANNs
identical in every way except for their initially randomized
weights.

In analogy to the PSP method, we call the ANN
outputs P^j_X, which are subjected to the seven conditions
described in §3. Line j is identified conclusively only if
a single P^j_X or P^j_OIII + P^j_Hβ exceeds p_ANN.

² When referring to line identification, "accuracy" in this paper
means identifying lines correctly. "Precision" means the confidence
with which a line is identified, or its value of P^j_X. Therefore,
increasing p_PSP imposes a stricter condition on precision, thereby
increasing accuracy.

Fig. 1. Left: A simplified representation of the PSP method, with the four-dimensional parameter space projected onto the BRI
color-color plane. Training set galaxies are color-coded by the values of their wavelength Gaussians (see §3.2). Those galaxies whose Hα
wavelength Gaussians exceed 5 × 10^-4 are red, and so on for the other three line identities. A galaxy with a single line observed at 8425 Å
(X) falls in the middle of a locus of red points. Therefore, this line is identified as Hα. In reality, each training set point is a fixed-volume,
four-dimensional Gaussian whose width is given by errors in observable quantities. The sum of all the Gaussians in a particular line
category at the location of the single-line galaxy is the probability that the single line also falls into that category. The two suppressed
axes are R magnitude and observed wavelength. Right: The same figure for a galaxy with different colors and wavelength. In the absence
of additional information, the identity of the single line cannot be determined because galaxies from multiple line identities populate the
color-color space around the single-line galaxy. Both panels: The sharp color-color cut in Fields 2-4 ensures that only Field 1 galaxies
populate the upper left corner of the diagram. Note that z_Hα = 0.28 in the left panel and z_Hα = 0.12 in the right panel. The second panel
has fewer Hα points because the differential volume of the Universe is smaller at the second z_Hα.

Fig. 2. The 4:4:4:4 artificial neural network architecture used
for determining the identities of single emission lines. Triangles
represent the observed wavelength and three broadband filter
inputs; squares represent the probabilities of each line identity; and
circles represent hidden nodes.


4. ACCURACY 

Although setting aside evaluation sets reduces the
training set sizes and hence reduces precision,
evaluation establishes confidence in the line identifications. In
§3.1, we identified one line in each known-redshift galaxy
to serve as the surrogate "single" line. In this section,
we subject each evaluation galaxy to both methods and
report the estimated identity of each line.

We choose p_PSP = 0.9 and p_ANN = 0.8, motivated by
the arguments below. Tables 1-3 detail the results
separated by field. Each column is an actual surrogate single
line identity, and each row is a result from the
identification algorithm. Correct identifications are shown in
bold, where we consider the [O III], Hβ, and "[O III] or
Hβ" categories to be correct for both [O III] and Hβ lines.
(If we assume the rest wavelength of the single line is the
average of 4861 Å and 5007 Å, then the redshift will be
skewed by 1.5%, which is more precise than even the best
photo-zs.)

The prior conditions in §3 improve the results
significantly. The priors corrected 465 identifications in the
PSP method, mostly [O III] and [O II], and 597
identifications in the ANN method, overwhelmingly [O III]. The
large majority of the identifications corrected to Hα and
[O II] were inconclusive before the application of the prior
TABLE 1
Evaluation set accuracy, all fields.

PSP Method
                          Actual Line Identity
Identified As          Hα  [O III]       Hβ   [O II]    Total
Hα                    201       15        2        2      220
[O III]                14      200        8        4      226
Hβ                      3       36       40        4       83
[O II]                ...      ...      ...     1896     1896
[O III] or Hβ          28      571      365       25      989
Inconclusive           63       79        9      567      718
Total                 309      901      424     2498     4132

ANN Method
                          Actual Line Identity
Identified As          Hα  [O III]       Hβ   [O II]    Total
Hα                    226       11        2        1      240
[O III]                14      545       32        5      596
Hβ                    ...       18       60        1       79
[O II]                ...      ...      ...     2184     2184
[O III] or Hβ           8      217      316       16      557
Inconclusive           61      110       14      291      476
Total                 309      901      424     2498     4132



TABLE 2
Evaluation set accuracy, Field 1.

PSP Method
                          Actual Line Identity
Identified As          Hα  [O III]       Hβ   [O II]    Total
Hα                    166        8        2        2      178
[O III]                12      106        6        1      125
Hβ                    ...       20       17      ...       37
[O II]                ...      ...      ...      372      372
[O III] or Hβ          19      223      133        7      382
Inconclusive           50       62        4       96      212
Total                 247      419      162      478     1306

ANN Method
                          Actual Line Identity
Identified As          Hα  [O III]       Hβ   [O II]    Total
Hα                    179        5      ...        1      185
[O III]                 8      215       15      ...      238
Hβ                    ...       11       22        1       34
[O II]                ...      ...      ...      432      432
[O III] or Hβ           5      105      113        3      226
Inconclusive           55       83       12       41      191
Total                 247      419      162      478     1306



conditions. Those corrected to [O III] were mostly "[O III]
or Hβ" or inconclusive, and those corrected to Hβ were
mostly "[O III] or Hβ." The prior conditions work for
many [O III] lines in the ANN method because the P_OIII
values for those lines are high even before the application
of the prior conditions. In the PSP method, even though
the prior conditions can eliminate Hβ as a possibility,
P_OIII is not large enough for a conclusive identification.
Our choice of a smaller p_ANN than p_PSP creates this
particular success of the ANN method.

Both methods show no large statistical difference be- 
tween the fields with different selection functions. Field 
1 contains more actual Ha lines than Fields 2-4, but 
the accuracy with which they are identified is about the 
same. This similarity between the fields justifies using 
one known-redshift set instead of one for Field 1 and a 
different one for Fields 2-4. 



TABLE 3
Evaluation set accuracy, Fields 2-4.

PSP Method
                          Actual Line Identity
Identified As          Hα  [O III]       Hβ   [O II]    Total
Hα                     35        7      ...      ...       42
[O III]                 2       94        2        3      101
Hβ                      3       16       23        4       46
[O II]                ...      ...      ...     1524     1524
[O III] or Hβ           9      348      232       18      607
Inconclusive           13       17        5      471      506
Total                  62      482      262     2020     2826

ANN Method
                          Actual Line Identity
Identified As          Hα  [O III]       Hβ   [O II]    Total
Hα                     40        5        2      ...       47
[O III]                 6      334       16        3      359
Hβ                    ...       11       41      ...       52
[O II]                ...      ...      ...     1751     1751
[O III] or Hβ           5      105      202       19      331
Inconclusive           11       27        1      247      286
Total                  62      482      262     2020     2826



TABLE 4
Comparison between PSP and ANN methods on evaluation set.

                                                ANN Method
PSP Method                 Hα    [O III]         Hβ     [O II] [O III]/Hβ       inc.      Total
Hα                        182          3        ...        ...          3         32        220
[O III]                     5        149        ...        ...         42         30        226
Hβ                          4         12         26          1         28         12         83
[O II]                    ...        ...        ...       1839          1         56       1896
[O III] or Hβ              11        411         52          1        464         50        989
Inconclusive               38         21          1        343         19        296        718
Total                     240        596         79       2184        557        476       4132



The ANN method is superior to the PSP method in
both accuracy and number of conclusive identifications.
The PSP method in particular identifies 9.1% of the
actual Hα lines as "[O III] or Hβ" compared to only 2.6%
for the ANN method. Interestingly, 8.6% and 5.8% of the
lines identified as Hα in the PSP and ANN methods are
incorrect, despite the underrepresentation of Hα lines in
Fields 2-4. Not surprisingly, the PSP method performs
poorly at identifying [O III] and Hβ. However, the ANN
method performs very well at identifying [O III] (91.4%
correct identifications) because the prior conditions
correct so many identifications to [O III]. Every line
classified as [O II] in both methods is correct, largely because
of condition (7) in §3. The "[O III] or Hβ" category is
94.6% and 95.7% accurate for the PSP and ANN methods,
respectively, but the ANN method conclusively and
correctly identifies many more lines as [O III].
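
As a worked check, the Python sketch below recomputes two of the PSP percentages quoted above from the "[O III] or Hβ" rows and the "Total" rows of the Field 1 and Fields 2-4 evaluation tables. The variable and function names are illustrative assumptions and are not taken from the original analysis.

```python
# Actual line identities, in the column order of the evaluation tables.
ACTUAL = ["Ha", "OIII", "Hb", "OII"]

# PSP rows for lines identified as "[O III] or Hb"
# (Field 1 first, then Fields 2-4), copied from the tables.
psp_o3_or_hb = [
    {"Ha": 19, "OIII": 223, "Hb": 133, "OII": 7},
    {"Ha": 9, "OIII": 348, "Hb": 232, "OII": 18},
]

# "Total" rows: number of actual lines of each identity in each table.
actual_totals = [
    {"Ha": 247, "OIII": 419, "Hb": 162, "OII": 478},
    {"Ha": 62, "OIII": 482, "Hb": 262, "OII": 2020},
]


def column_sum(rows, key):
    """Add one column over both field groups."""
    return sum(row[key] for row in rows)


# All lines that PSP placed in the "[O III] or Hb" category.
identified = sum(column_sum(psp_o3_or_hb, key) for key in ACTUAL)

# The category is right when the actual line is [O III] or Hb.
right = column_sum(psp_o3_or_hb, "OIII") + column_sum(psp_o3_or_hb, "Hb")
category_accuracy = right / identified

# Fraction of actual Ha lines that ended up in this category.
ha_as_o3_or_hb = column_sum(psp_o3_or_hb, "Ha") / column_sum(actual_totals, "Ha")

print(f"{100 * category_accuracy:.1f}%")  # 94.6%
print(f"{100 * ha_as_o3_or_hb:.1f}%")     # 9.1%
```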

Table 4 shows the number of lines from the evaluation
set that were identified the same and differently between
the two methods. The numbers of lines that fell into the
same category in both methods lie along the diagonal of
the table. Very few conclusive identifications are different
between the two methods.
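
The comparison can be summarized numerically. The sketch below is a minimal illustration, assuming that empty cells in Table 4 stand for zero; it counts the lines on the diagonal, the lines for which the two methods each name a single but different emission line, and the fraction of lines each method identifies at all. The names are hypothetical.

```python
CATEGORIES = ["Ha", "OIII", "Hb", "OII", "OIII_or_Hb", "inconclusive"]

# Rows: PSP category; columns: ANN category, both in CATEGORIES order.
# Empty table cells are entered as 0 (an assumption).
table4 = [
    [182,   3,  0,    0,   3,  32],
    [  5, 149,  0,    0,  42,  30],
    [  4,  12, 26,    1,  28,  12],
    [  0,   0,  0, 1839,   1,  56],
    [ 11, 411, 52,    1, 464,  50],
    [ 38,  21,  1,  343,  19, 296],
]

total = sum(sum(row) for row in table4)

# Same category in both methods: the diagonal of the table.
same = sum(table4[i][i] for i in range(len(CATEGORIES)))

# Both methods name one specific line, but not the same one
# (only the first four categories are single-line identifications).
conclusive_conflicts = sum(
    table4[i][j] for i in range(4) for j in range(4) if i != j
)

# Lines left inconclusive by each method.
psp_inconclusive = sum(table4[5])
ann_inconclusive = sum(row[5] for row in table4)

print(total, same, conclusive_conflicts)               # 4132 2956 25
print(f"{100 * (1 - psp_inconclusive / total):.1f}%")  # 82.6%
print(f"{100 * (1 - ann_inconclusive / total):.1f}%")  # 88.5%
```

Only 25 of the 4132 evaluation lines receive two different single-line identifications under this counting, which is one way to read "very few."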

We find the optimal accuracy parameter p for each
method by attempting to maximize accuracy while
minimizing the number of inconclusive identifications. We
classify accurate Hα, [O II], and "[O III] or Hβ"
identifications as "correct." Individual [O III] and Hβ
identifications are correct if the actual line is either [O III] or Hβ.
Lines where P_X < p for all X are "inconclusive" except
where P_[O III] + P_Hβ > p. Inaccurate Hα, [O II], and "[O III]
or Hβ" identifications are "incorrect," as well as individual
[O III] and Hβ identifications where the actual line is
Hα or [O II]. Figure 3 shows the results for each method.
We find that p_PSP = 0.9 and p_ANN = 0.8 give ratios of
correct to the sum of correct and incorrect results near
0.98 without sacrificing many conclusive identifications.

Fig. 3. Top: Fraction of total evaluation set galaxies correctly, incorrectly, and inconclusively identified in both methods versus the
accuracy parameter p. We allow the [O III], Hβ, and "[O III] or Hβ" identifications to be "correct" as long as the actual line is [O III] or Hβ.
Bottom: The ratio of correct identifications to the sum of correct and incorrect identifications, as defined in the top panel.
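
The thresholding and scoring rules described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original implementation: the function names and the dictionary layout are hypothetical, and a probability exactly equal to p is treated as conclusive because the text defines "inconclusive" by P_X < p.

```python
LINES = ("Ha", "OIII", "Hb", "OII")


def classify(probs, p):
    """Map the four line probabilities to one of six categories."""
    best = max(LINES, key=lambda name: probs[name])
    if probs[best] >= p:
        return best                      # one line passes the threshold
    if probs["OIII"] + probs["Hb"] > p:
        return "OIII_or_Hb"              # ambiguous between the two lines
    return "inconclusive"


def score(identified, actual):
    """Return 'correct', 'incorrect', or 'inconclusive' as used for Fig. 3."""
    if identified == "inconclusive":
        return "inconclusive"
    if identified in ("OIII", "Hb", "OIII_or_Hb"):
        # [O III] and Hb are interchangeable for scoring purposes.
        return "correct" if actual in ("OIII", "Hb") else "incorrect"
    return "correct" if identified == actual else "incorrect"


example = {"Ha": 0.05, "OIII": 0.50, "Hb": 0.40, "OII": 0.05}
print(classify(example, p=0.8))      # OIII_or_Hb
print(score("OIII_or_Hb", "Hb"))     # correct
print(score("Hb", "OII"))            # incorrect
```

Raising p moves lines from the conclusive categories into "inconclusive," which is the trade-off shown in Fig. 3.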

5. RESULTS ON SINGLE-LINE GALAXIES 
5.1. Single Line Identification 

We apply both methods to the 640 truly single emission
lines. Table 5 lists the number of single lines identified in
each category separated by field. A comparison of each
column in this table to the "Total" columns in Tables 2 and 3
immediately shows that the single-line population
contains fewer [O II] lines because [O II] is often resolved and
identifiable in DEEP2 spectra. It can be a single line only
when the galaxy's internal velocities blend the doublet.

Of the four observables used to determine line identities,
the two colors are the most powerful. Figure 4
shows the BRI color-color plane for the lines identified
in both methods. It is useful to compare this figure with
Fig. [U Conclusive line identifications are possible in re- 
gions where the training set categories overlap because R 
magnitude and observed wavelength also help to deter- 
mine line identity and, more importantly, because the 
seven conditions described in §3 rule out certain line
identities, leaving one dominant identity. 

In contrast to the evaluation set, the priors change a
large fraction of the single-line set identifications. In the
PSP method, the priors changed the identifications of 517
(80.8%) of the single lines, largely from Hβ to "[O III] or
Hβ" or from [O II] to inconclusive. In the ANN method,
priors changed the identifications of 285 (44.5%) of the
single lines, mostly from [O II] or inconclusive to Hα,
[O III], or Hβ. While these large fractions may seem to
diminish the power of the core PSP and ANN methods,
the priors eliminate only one line in most cases. It is still
up to the core algorithm to choose among the remaining
three lines based on broadband photometry and observed
wavelength.

TABLE 5
Single line identification.

PSP Method

Identified As          Field 1  Fields 2-4  All Fields
Hα                          51          60         111
[O III]                     16          16          32
Hβ                          12          13          25
[O II]                       9          50          59
[O III] or Hβ               85          89         174
Inconclusive                91         148         239
Total                      264         376         640

ANN Method

Identified As          Field 1  Fields 2-4  All Fields
Hα                          79          87         166
[O III]                     55          47         102
Hβ                           4          11          15
[O II]                      21          73          94
[O III] or Hβ               40          55          95
Inconclusive                65         103         168
Total                      264         376         640
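
The counts in Table 5 and the prior-change counts quoted above can be turned into recovery fractions with a short calculation. The Python sketch below is illustrative only; the names are hypothetical, and "identified" here means any category other than "inconclusive," including the ambiguous "[O III] or Hβ" category.

```python
TOTAL_SINGLE_LINES = 640

# "All Fields" column of Table 5 for each method.
counts = {
    "PSP": {"Ha": 111, "OIII": 32, "Hb": 25, "OII": 59,
            "OIII_or_Hb": 174, "inconclusive": 239},
    "ANN": {"Ha": 166, "OIII": 102, "Hb": 15, "OII": 94,
            "OIII_or_Hb": 95, "inconclusive": 168},
}

# Number of single lines whose identification the priors changed.
changed_by_priors = {"PSP": 517, "ANN": 285}


def summarize(method):
    """Return (identified count, identified fraction, prior-changed fraction)."""
    row = counts[method]
    # Each column of the table must account for every single line.
    assert sum(row.values()) == TOTAL_SINGLE_LINES
    identified = TOTAL_SINGLE_LINES - row["inconclusive"]
    return (
        identified,
        identified / TOTAL_SINGLE_LINES,
        changed_by_priors[method] / TOTAL_SINGLE_LINES,
    )


for method in ("PSP", "ANN"):
    identified, frac_identified, frac_changed = summarize(method)
    print(method, identified, round(frac_identified, 4), round(frac_changed, 4))
```

The identified counts come out to 401 for PSP and 472 for ANN, the same totals quoted in the abstract.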

Table 6 shows the number of single lines that were identified
the same and differently between the two methods.
Conclusive line identifications do not change significantly
between the two methods.

Fig. 4. BRI color-color diagrams for the 640 single-line galaxies in all four fields color-coded by their identifications in both methods.
Points that represent the "[O III] or Hβ" and "inconclusive" identifications are smaller so that they do not obscure the other points.

TABLE 6
Comparison between PSP and ANN methods on single-line galaxies.

                                                ANN Method
PSP Method                 Hα    [O III]         Hβ     [O II] [O III]/Hβ       inc.      Total
Hα                        100        ...        ...        ...        ...         11        111
[O III]                     1         22        ...        ...        ...          9         32
Hβ                          3          1         11        ...        ...         10         25
[O II]                    ...        ...        ...         58        ...          1         59
[O III] or Hβ               1         61          4        ...         87         21        174
Inconclusive               61         18        ...         36          8        116        239
Total                     166        102         15         94         95        168        640

TABLE 7
Recovered redshifts.

PSP Method

                            Actual Line Identity
Identified As            Hα  [O III]       Hβ   [O II]    Total
Hα                        5      ...      ...      ...        5
[O III]                 ...        2      ...      ...        2
Hβ                      ...      ...      ...      ...      ...
[O II]                  ...      ...      ...        9        9
[O III] or Hβ             1        3        3        1        8
Inconclusive              2        5      ...        2        9
Total                     8       10        3       12       33

ANN Method

                            Actual Line Identity
Identified As            Hα  [O III]       Hβ   [O II]    Total
Hα                        6        1      ...      ...        7
[O III]                 ...        4      ...      ...        4
Hβ                      ...      ...      ...      ...      ...
[O II]                  ...      ...      ...       10       10
[O III] or Hβ           ...        2        3      ...        5
Inconclusive              2        3      ...        2        7
Total                     8       10        3       12       33

During the visual inspection of all single-line candidates,
we identified the redshifts of 33 galaxies using
additional lines that previous inspectors missed because
they were dim or nearly hidden. Table 7 summarizes the
results. The PSP method identifies 22 lines correctly,
2 incorrectly, and 9 inconclusively. The ANN method
identifies 25 lines correctly, 1 incorrectly, and 7 inconclusively.
Because these galaxies with recovered redshifts
are so similar to the other single-line galaxies, their high
success rate lends credence to both algorithms.
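
The tallies quoted above follow from Table 7 once the same "correct" convention used for the evaluation set is applied, in which [O III], Hβ, and "[O III] or Hβ" identifications count as correct whenever the actual line is [O III] or Hβ. The sketch below is an illustrative recomputation; the names are hypothetical and empty table cells are assumed to be zero.

```python
ACTUAL = ("Ha", "OIII", "Hb", "OII")

# Rows: identified category; tuple entries: counts per actual line in
# ACTUAL order. Empty cells of Table 7 are entered as 0 (an assumption).
recovered = {
    "PSP": {
        "Ha": (5, 0, 0, 0),
        "OIII": (0, 2, 0, 0),
        "Hb": (0, 0, 0, 0),
        "OII": (0, 0, 0, 9),
        "OIII_or_Hb": (1, 3, 3, 1),
        "inconclusive": (2, 5, 0, 2),
    },
    "ANN": {
        "Ha": (6, 1, 0, 0),
        "OIII": (0, 4, 0, 0),
        "Hb": (0, 0, 0, 0),
        "OII": (0, 0, 0, 10),
        "OIII_or_Hb": (0, 2, 3, 0),
        "inconclusive": (2, 3, 0, 2),
    },
}


def is_correct(identified, actual):
    """[O III] and Hb identifications are interchangeable when scoring."""
    if identified in ("OIII", "Hb", "OIII_or_Hb"):
        return actual in ("OIII", "Hb")
    return identified == actual


def tally(table):
    """Count correct, incorrect, and inconclusive identifications."""
    result = {"correct": 0, "incorrect": 0, "inconclusive": 0}
    for identified, row in table.items():
        for actual, n in zip(ACTUAL, row):
            if identified == "inconclusive":
                result["inconclusive"] += n
            elif is_correct(identified, actual):
                result["correct"] += n
            else:
                result["incorrect"] += n
    return result


print(tally(recovered["PSP"]))  # {'correct': 22, 'incorrect': 2, 'inconclusive': 9}
print(tally(recovered["ANN"]))  # {'correct': 25, 'incorrect': 1, 'inconclusive': 7}
```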

5.2. Spectral Coaddition 

A useful statistical check is to coadd the one-dimensional
spectra of all of the galaxies identified in
each category. We shift all the spectra to their rest
frames and normalize them such that the median number
of counts in a pixel is 1. Then, we coadd them
with inverse variance weighting. The coadded spectra
are smoothed with a Gaussian window function of
σ = 2.5 pixels to approximate the instrumental resolution.
If a line is identified properly, other spectral features
associated with the single line should emerge above
the noise. On the other hand, if the line is misidentified,
some spectral features associated with other lines may
be present. Although this technique cannot identify
individual lines, it can give an idea of overall success or
failure for each identification category.
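
The coaddition steps can be sketched in plain Python as follows. This is a minimal illustration rather than the pipeline actually used: the common rest-frame grid, the linear interpolation of both flux and variance onto that grid, the truncation of the Gaussian kernel at 4σ, and all function names are assumptions made for the example.

```python
import math
from statistics import median


def to_rest_frame(wavelengths, z):
    """Shift observed wavelengths to the rest frame of redshift z."""
    return [w / (1.0 + z) for w in wavelengths]


def normalize(flux, var):
    """Scale a spectrum so that its median flux per pixel is 1."""
    m = median(flux)
    return [f / m for f in flux], [v / m ** 2 for v in var]


def interp(x, xs, ys):
    """Linear interpolation at x; None outside the sampled range."""
    if x < xs[0] or x > xs[-1]:
        return None
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    return None


def coadd(spectra, grid):
    """Inverse-variance weighted mean of (wave, flux, var) spectra on grid."""
    out = []
    for x in grid:
        num = den = 0.0
        for wave, flux, var in spectra:
            f = interp(x, wave, flux)
            v = interp(x, wave, var)
            if f is None or v is None or v <= 0:
                continue  # this spectrum does not cover the pixel
            num += f / v
            den += 1.0 / v
        out.append(num / den if den > 0 else float("nan"))
    return out


def gaussian_smooth(values, sigma=2.5):
    """Smooth with a Gaussian kernel of width sigma (in pixels)."""
    half = int(math.ceil(4 * sigma))
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(values) and not math.isnan(values[j]):
                num += w * values[j]
                den += w
        smoothed.append(num / den if den > 0 else float("nan"))
    return smoothed
```

A typical use would call `to_rest_frame` and `normalize` on each spectrum in a category, pass the results to `coadd` on a shared wavelength grid, and then apply `gaussian_smooth` to the coadded flux.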

Figure 5 shows the coadded spectra for the galaxies
identified in each of the six categories. Emission and
absorption features shown at four different redshifts,
corresponding to the four allowed identities of each single
line, are marked on each spectrum. The fourteen
features are [O II] λ3727, CaK λ3934, CaH λ3969, Hγ
λ4341, Hβ λ4861, [O III] λλ4959, 5007, Mgb λ5173, NaD
λ5893, [N II] λλ6548, 6583, Hα λ6563, and [S II] λλ6716,
6731. Bold symbols direct attention to potentially real
spectral features.
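
Each allowed identity of a single line implies a different redshift, and therefore different observed wavelengths for the accompanying features. The sketch below illustrates that bookkeeping for the emission features listed above, using z = λ_obs / λ_rest - 1. The function names and the example wavelength are hypothetical, and the rest wavelengths are the rounded values given in the text.

```python
# Rest wavelengths (Angstroms) of the four candidate identities.
REST = {"Ha": 6563.0, "OIII": 5007.0, "Hb": 4861.0, "OII": 3727.0}

# Emission features to look for in a coadded spectrum (Angstroms).
EMISSION_FEATURES = {
    "OII_3727": 3727.0,
    "Hg_4341": 4341.0,
    "Hb_4861": 4861.0,
    "OIII_4959": 4959.0,
    "OIII_5007": 5007.0,
    "NII_6548": 6548.0,
    "Ha_6563": 6563.0,
    "NII_6583": 6583.0,
    "SII_6716": 6716.0,
    "SII_6731": 6731.0,
}


def candidate_redshifts(lambda_obs):
    """Redshift implied by each of the four allowed line identities."""
    return {name: lambda_obs / rest - 1.0 for name, rest in REST.items()}


def expected_features(lambda_obs, identity):
    """Observed wavelengths of the other features if `identity` is right."""
    z = lambda_obs / REST[identity] - 1.0
    return {name: rest * (1.0 + z) for name, rest in EMISSION_FEATURES.items()}


# Hypothetical single line observed at 7454 Angstroms.
for name, z in candidate_redshifts(7454.0).items():
    print(name, round(z, 3))
```

If the hypothetical line at 7454 Å were [O II], the galaxy would lie at z = 1 and Hβ would be expected at 9722 Å; if it were Hα, the redshift would be only about 0.136.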

Both methods show [N II] and [S II] in the Hα spectra;
[O III] λ4959 and Hβ in the [O III] spectra; CaK in the
Hβ spectra; and CaH, CaK, and Hγ as well as high-order
Balmer absorption in the [O II] spectra. The [O II] lines
are broad, as expected. The "[O III] or Hβ" spectra for
both methods display CaH, CaK, Hγ, and [O III] λ5007
associated with Hβ. The PSP method spectrum may
also contain [O III] λ4959 associated with [O III] λ5007.
Finally, the inconclusive category contains few convincing
lines, but features associated with Hα and [O III] may
be present. The presence of expected lines and absence of
others in conclusively identified single-line galaxy spectra
strengthens the credibility of the identification of single
emission lines.

Fig. 5. The coadded spectra of lines identified in each of the six categories. Each spectrum is labeled in Angstroms and in the rest
frame of the identified line. "[O III] or Hβ" is in the rest frame of [O III] λ5007, and "inconclusive" is in the rest frame of Hα λ6563. The
symbols indicate spectral features at four different redshifts: red if the single line is Hα, orange for [O III], green for Hβ, and blue for
[O II]. Arrows represent emission lines, and pluses represent absorption lines. Bold symbols draw attention to noticeable features, and thin
symbols mark the locations of absent features. The peaks of the single lines are stronger than the maximum plotted flux.

6. CONCLUSIONS 

The methods presented here combine photometry,
spectroscopy, and physically motivated arguments about
line flux ratios to determine the identity of single emission
lines in galaxy spectra. The resultant redshifts seem
very accurate. The parameter space proximity and neural
network methods identify 82.6% and 88.5% of the
lines in the evaluation set, and for the neural network
method, over 98% of those identifications are correct.
The spectral resolution of DEIMOS makes both methods
nearly perfect at identifying [O II], but they make more
mistakes in identifying Hα (8.6% and 5.8% failure rates).
The methods identify [O III] and Hβ lines as one of the
two with 6.0% and 3.6% failure rates. Even this ambiguous
identification can give a redshift more precise
than present state-of-the-art photo-zs. Remarkably, the
neural network method correctly identifies well over half
of the [O III] lines without this ambiguity. The parameter
space proximity method recovers redshifts for 62.7%
of the 640 single-line galaxies, and the neural network
method recovers redshifts for 73.8% of the sample. Overall,
the neural network method seems superior in both
accuracy and recovery rate.

Identifying single emission lines is important to future
work in DEEP2. The survey contains about 1,000
serendipitously detected galaxies ("serendips"), many of
which display only one emission line. The line identification
algorithms may be applied to these objects and could
double the number of identified single lines. (One concern
is that the serendips will occupy a region of parameter
space not populated by DEEP2 targets in the
training set.) Additionally, the wealth of data in Field 1,
or the EGS, can increase accuracy and reduce inconclusive
identifications with supplemental broadband measurements
or morphological parameters, especially angular
sizes, from high-resolution images. EGS photo-zs calculated
without spectroscopic information can also constrain
redshifts enough to identify single lines. Furthermore,
all four fields contain more information than
we use in this work. For example, surface brightness,
angular size, and the galaxy-galaxy correlation function
all contain information about redshift. DEEP2 catalogs
these three observables, which may be incorporated into
both single line identification methods. Finally, although
many single-line galaxy spectra display very faint continua,
cross-correlations with the continua of known-redshift
galaxies or templates can reveal more redshift
information. We plan to address these possibilities in
future work.